
HKU Scholars Hub

Estimation of alternative splicing isoform frequencies from RNA-Seq data

Author: A Mortazavi
A Oshlack
A Roberts
Alex Zelikovsky
B Jackson
B Langmead
B Li
B Paşaniuc
BE Howard
C Trapnell
C Trapnell
CP Ponting
D Hiller
E Wang
H Jiang
H Richard
I Birol
Ion I Măndoiu
J Bloom
J Clarke
J Eid
J Feng
KD Hansen
M Anton
M Griffith
M Guttman
M Sultan
Marius Nicolae
P Carninci
Serghei Mangul
Team MGC Project
V Lacroix
Y She
Y Surget-Groba
Z Wang
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, is quickly becoming the technology of choice for gene expression profiling. However, due to the short read length delivered by current sequencing technologies, estimation of expression levels for alternative splicing gene isoforms remains challenging. Results: In this paper we present a novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data. Our algorithm, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information when available. The open source Java implementation of IsoEM is freely available at http://dna.engr.uconn.edu/software/IsoEM/. Conclusions: Empirical experiments on both synthetic and real RNA-Seq datasets show that IsoEM has scalable running time and outperforms existing methods of isoform and gene expression level estimation. Simulation experiments confirm previous findings that, for a fixed sequencing cost, using reads longer than 25-36 bases does not necessarily lead to better accuracy for estimating expression levels of annotated isoforms and genes.
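The general idea of expectation-maximization for isoform abundance can be illustrated with a minimal Python sketch. This is not the IsoEM implementation: it assumes a deliberately simplified model in which each read is only known to be compatible with a set of isoforms and is equally likely to start anywhere along a compatible isoform, and it ignores insert sizes, base quality scores, strand and pairing information. The function name `em_isoform_frequencies` and its parameters are hypothetical.

```python
def em_isoform_frequencies(read_compat, lengths, n_iter=500, tol=1e-9):
    """Estimate relative isoform frequencies with a simplified EM.

    read_compat: list of lists; each inner list names the isoforms a read
                 is compatible with (hypothetical input format).
    lengths:     dict mapping isoform name -> effective length in bases.
    Returns a dict mapping isoform name -> estimated relative frequency.
    """
    isoforms = sorted(lengths)
    # Reads that match no isoform carry no information in this model.
    reads = [compat for compat in read_compat if compat]
    if not reads:
        raise ValueError("no read is compatible with any isoform")

    # theta[i]: fraction of reads originating from isoform i (uniform start).
    theta = {i: 1.0 / len(isoforms) for i in isoforms}

    for _ in range(n_iter):
        expected = {i: 0.0 for i in isoforms}
        # E-step: split each read among its compatible isoforms in
        # proportion to theta[i] / lengths[i].
        for compat in reads:
            weights = {i: theta[i] / lengths[i] for i in compat}
            total = sum(weights.values())
            for i, w in weights.items():
                expected[i] += w / total
        # M-step: new read fractions are the normalized expected counts.
        new_theta = {i: expected[i] / len(reads) for i in isoforms}
        delta = max(abs(new_theta[i] - theta[i]) for i in isoforms)
        theta = new_theta
        if delta < tol:
            break

    # Convert read fractions to isoform frequencies by length normalization.
    per_base = {i: theta[i] / lengths[i] for i in isoforms}
    norm = sum(per_base.values())
    return {i: per_base[i] / norm for i in isoforms}


if __name__ == "__main__":
    # Toy data: 30 reads unique to A, 10 unique to B, 60 ambiguous.
    toy_reads = [["A"]] * 30 + [["B"]] * 10 + [["A", "B"]] * 60
    print(em_isoform_frequencies(toy_reads, {"A": 1000, "B": 1000}))
```

In the toy data the ambiguous reads are redistributed according to the evidence from the uniquely assigned reads, so the estimates settle near 0.75 for A and 0.25 for B. Per the abstract, a method such as IsoEM additionally weights each read-isoform pair using further information, such as the insert size distribution.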

ScholarWorks @ Georgia State University

Springer - Publisher Connector

Dagstuhl Research Online Publication Server

Methods to study splicing from high-throughput RNA Sequencing data

Author: A Ameur
A Bhasi
A Dobin
A Mortazavi
A Oshlack
A Roberts
A Roberts
AM Mezlini
AN Brooks
B Jackson
B Kakaradov
B Langmead
B Li
B Li
BJ Haas
BJ Haas
C Trapnell
C Trapnell
C Trapnell
D Hiller
D Singh
DL Wood
DW Bryant
E Eyras
E Lee
E Turro
ET Wang
F Birzele
F Bona De
F Denoeud
F Tang
G Robertson
G Xu
GA Sacomoto
GR Grant
GS Slater
H Bao
H Jiang
H Jiang
H Kim
H Richard
J Behr
J Du
J Feng
J Hu
J Lovén
J Martin
J Salzman
J Seok
J Seok
J Wu
J Wu
JE Allen
JJ Li
JP Venables
K Schneeberger
K Wang
KD Hansen
KF Au
KL Howe
KM Borgwardt
L Chen
L Chen
L Wang
L Wang
LY Chen
M Aschoff
M Fiume
M Garber
M Griffith
M Guttman
M Stanke
M Stanke
M Sultan
MC Ryan
MF Rogers
MG Grabherr
MH Schulz
MT Dimon
N Cloonan
N Cloonan
N Deng
N Leng
N Nicolae
N Philippe
N Vijay
NA Fonseca
O Stegle
P Drewe
P Glaus
PL Martelli
PP Labaj
Q Liu
Q Liu
Q Pan
QY Zhao
R Bohnert
R Guigó
R Li
S Anders
S Djebali
S Filichkin
S Heber
S Huang
S Lee
S Mangul
S Marco-Sola
S Shen
S Sonnenburg
S Srivastava
S Tang
S Zheng
SB Montgomery
SH Nagaraj
SK Lou
T Bonfert
TA Clark
TD Wu
TD Wu
W Li
W Li
W Wang
WJ Kent
Y Hu
Y Katz
Y Li
Y Liao
Y Surget-Groba
Y Xing
Y Xing
Y Zhang
Z Xia
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/07/2015

The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has turned this into a challenging task. In the last few years, a plethora of tools have been developed, allowing researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short RNA-Seq data. We group the methods according to the different questions they address: 1) Assignment of the sequencing reads to their likely gene of origin. This is addressed by methods that map reads to the genome and/or to the available gene annotations. 2) Recovering the sequence of splicing events and isoforms. This is addressed by transcript reconstruction and de novo assembly methods. 3) Quantification of events and isoforms. Either after reconstructing transcripts or using an annotation, many methods estimate the expression level or the relative usage of isoforms and/or events. 4) Providing an isoform or event view of differential splicing or expression. These include methods that compare relative event/isoform abundance or isoform expression across two or more conditions. 5) Visualizing splicing regulation. Various tools facilitate the visualization of the RNA-Seq data in the context of alternative splicing. In this review, we do not describe the specific mathematical models behind each method. Our aim is rather to provide an overview that could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also attempt to propose a classification of the tools according to the operations they perform, to facilitate the comparison and choice of methods. Comment: 31 pages, 1 figure, 9 tables. Small corrections adde

arXiv.org e-Print Archive

Corset: enabling differential gene expression analysis for de novo assembled transcriptomes

Author: A Oshlack
Alicia Oshlack
B Langmead
B Li
BJ Haas
BZ Haznedaroglu
C Manning
C Soneson
C Trapnell
C Trapnell
CT Brown
DJ McCarthy
DR Zerbino
EA Hornett
F Ozsolak
G Pertea
G Robertson
H Li
I Nookaew
J Duan
J Martin
J Simpson
K Finstermeier
KL Ayers
L Fu
L Smeds
M Grabherr
M Robinson
MH Schulz
MJL De Hoon
N Vijay
Nadia M Davidson
Q-Y Zhao
R Garg
S Anders
S McGinnis
T Sandmann
VM Kvam
W Zhang
WJ Kent
WR Francis
Y Oono
Y Surget-Groba
Y Yang
Z Wang
Publication venue: 'Springer Science and Business Media LLC'

Public Library of Science (PLOS)

Patterns of Positive Selection and Neutral Evolution in the Protein-Coding Genes of Tetraodon and Takifugu

Author: A Christoffels
A Eyre-Walker
A Eyre-Walker
A Hobolth
AE Hirsh
AEO Trezise
AG Clark
C Kosiol
CD Derby
D Wang
Dirk Steinke
DP Wang
F Luo
G Amitai
GC Nickel
GH Jacobs
H Yang
HB Fraser
J Liu
JD Bloom
JD Bloom
JK Bowmaker
JM Young
Juan I. Montoya-Burgos
JZ Zhang
K Brcic-Kostic
K Ding
K Julenius
L Arbiza
LQ Zhang
LQ Zhang
M Hoffmann
M Przeworski
MS Springer
MV Han
N Patterson
O Jaillon
P Pamilo
PF Colosimo
PM Kim
Q Zheng
R Knight
R Nielsen
RA Gibbs
RR da Fonseca
SB Needleman
T Beissbarth
T Crnogorac-Jurcevic
T Ohta
TJP Hubbard
TS Mikkelsen
W Kai
WES Carr
WH Li
XH Zhang
Y Benjamini
Y Benjamini
Y Surget-Groba
Z Zhang
ZH Yang
ZH Yang
Publication venue: Public Library of Science
Publication date: 01/01/2011

Recent genome-wide analyses have revealed patterns of positive selection acting on protein-coding genes in humans and mammals. To assess whether the conclusions drawn from these analyses are valid for other vertebrates and to identify mammalian specificities, I have investigated the selective pressure acting on protein-coding genes of the puffer fishes Tetraodon and Takifugu. My results indicate that the strength of purifying selection in puffer fishes is similar to previous reports for murids but stronger than in hominids, which have a smaller population size. Gene ontology analyses show that more than half of the biological processes targeted by positive selection in mammals are also targeted in puffer fishes, highlighting general patterns for vertebrates. Biological processes enriched with positively selected genes that are shared between mammals and fishes include immune and defense responses, signal transduction, regulation of transcription and several of their descendant terms. Mammalian-specific processes displaying an excess of positively selected genes are related to sensory perception and neurological processes. The comparative analyses also revealed that, for both mammals and fishes, genes encoding extracellular proteins are preferentially targeted by positive selection, indicating that adaptive evolution occurs more often in the extracellular environment than inside the cell. Moreover, I present here the first genome-wide characterization of neutrally evolving regions of protein-coding genes. This analysis revealed an unexpectedly high proportion of genes containing both positively selected motifs and neutrally evolving regions, uncovering a strong link between neutral evolution and positive selection. I speculate that neutrally evolving regions are a major source of novelties screened by natural selection.

CiteSeerX

Archivio della ricerca - Università degli studi di Napoli Federico II

Archive ouverte UNIGE

The first transcriptome of Italian wall lizard, a new tool to infer about the Island Syndrome

Author: A Dobin
A Georges
A Pérez-Cembranos
AC Tzika
AL Ducrest
AM Bolger
B Li
B Morash
B Morash
BJ Haas
C Camacho
CAJ Janeway
D Fulgione
D Fulgione
D Wang
DL Mahler
DM Monti
Domenico Fulgione
E Quevillon
F Gambon-Deza
G Robertson
GA Leslie
GW Litman
J Jensen
J Klaczko
J Mistry
JB Losos
JB Losos
JB Losos
JE Wegener
JJ Marchanolis
JM Friedman
JS Fellah
K Sagonas
K Sagonas
KB Storey
MA Johnson
Maria Buglione
Martina Trapanese
MD Robinson
MF Flajnik
MF Flajnik
MG Grabherr
N Leng
P Chieffi
P Raia
Peng Xu
RD Smith-Unna
RE Faith
RL Burke
S Suzu
Serena Aceto
Simona Petrelli
SM Secor
T Wakayama
V Pérez-Mellado
Valeria Maselli
W Böhme
W Sanseverino
WL Eckalbar
Y Surget-Groba
YA Morris
Z Zhang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2017

Some insular lizards show a high degree of differentiation from their conspecific mainland populations, like the Licosa island lizards, which are described as affected by Reversed Island Syndrome (RIS). In previous work, we demonstrated that some traits of RIS, such as melanization, depend on differential expression of genes encoding melanocortin receptors. To better understand the basis of the syndrome, and to provide raw data for future investigations, we generated the first de novo transcriptome of the Italian wall lizard. Comparing mainland and island transcriptomes, we link differences in life traits to differential gene expression. Taking testis and brain sequences together, we generated 275,310 and 269,885 transcripts, and 18,434 and 21,606 proteins with Gene Ontology annotation, for the mainland and island populations respectively. Variant calling analysis identified about the same number of SNPs in the island and mainland populations. In contrast, differential gene expression analysis identified some putative genes involved in the syndrome that are more highly expressed in insular samples, such as Major Histocompatibility Complex class I, Immunoglobulins, Melanocortin 4 receptor, Neuropeptide Y and Proliferating Cell Nuclear Antigen.

Public Library of Science (PLOS)

FigShare

De Novo Transcriptome Sequencing in Anopheles funestus Using Illumina RNA-Seq Technology

Author: A Bateman
A Cohuet
A Cohuet
A Conesa
A Enayati
A Marchler-Bauer
A Mortazavi
AC Serazin
AL Toth
Alphonse Traoré
Antoine Sanou
Brian P. Lazzaro
C Costantini
C Trapnell
CS Wondji
D Charif
D Grimaldi
DJ Obbard
DR Zerbino
E Calvo
F Birzele
H Li
H Li
H Yassine
HE Hudson
I Maccallum
IV Sharakhov
J David
J Krzywinski
J Schultz
Jacob E. Crawford
JC Vera
JD Thompson
JT Simpson
Kenneth D. Vernick
LJ Collins
M Ashburner
M Coetzee
M Fraiture
MM Riehle
MT Gillies
MW Gaunt
N'Fale Sagnon
P Rice
Philip Awadalla
R Jiang
RA Holt
RH Hunt
RL Tatusov
RM Waterhouse
S Renaut
SF Altschul
TB Sackton
Wamdaogo M. Guelbeogo
X Wang
Y Surget-Groba
Z Wang
Publication venue: Public Library of Science
Publication date: 02/12/2010

BACKGROUND: Anopheles funestus is one of the primary vectors of human malaria, which causes a million deaths each year in sub-Saharan Africa. Few scientific resources are available to facilitate studies of this mosquito species and relatively little is known about its basic biology and evolution, making development and implementation of novel disease control efforts more difficult. The An. funestus genome has not been sequenced, so in order to facilitate genome-scale experimental biology, we have sequenced the adult female transcriptome of An. funestus from a newly founded colony in Burkina Faso, West Africa, using the Illumina GAIIx next-generation sequencing platform. METHODOLOGY/PRINCIPAL FINDINGS: We assembled short Illumina reads de novo using a novel approach involving iterative de novo assemblies and "target-based" contig clustering. We then selected a conservative set of 15,527 contigs through comparisons to four Dipteran transcriptomes as well as multiple functional and conserved protein domain databases. Comparison to the Anopheles gambiae immune system identified 339 contigs as putative immune genes, thus identifying a large portion of the immune system that can form the basis for subsequent studies of this important malaria vector. We identified 5,434 1:1 orthologues between An. funestus and An. gambiae and found that among these 1:1 orthologues, the protein sequences of those with putative immune function were significantly more diverged than the transcriptome as a whole. Short read alignments to the contig set revealed almost 367,000 genetic polymorphisms segregating in the An. funestus colony and demonstrated the utility of the assembled transcriptome for use in RNA-seq based measurements of gene expression. CONCLUSIONS/SIGNIFICANCE: We developed a pipeline that makes de novo transcriptome sequencing possible in virtually any organism at a very reasonable cost ($6,300 in sequencing costs in our case). We anticipate that our approach could be used to develop genomic resources in a diversity of systems for which full genome sequence is currently unavailable. Our An. funestus contig set and analytical results provide a valuable resource for future studies in this non-model, but epidemiologically critical, vector insect.

Public Library of Science (PLOS)

HAL-Pasteur

De Novo Analysis of Transcriptome Dynamics in the Migratory Locust during the Development of Phase Traits

Locusts exhibit remarkable density-dependent phenotype (phase) changes from the solitary to the gregarious, making them one of the most destructive agricultural pests. This phenotype polyphenism arises from a single genome and diverse transcriptomes in different conditions. Here we report a de novo transcriptome for the migratory locust and a comprehensive, representative core gene set. We carried out assembly of 21.5 Gb Illumina reads, generated 72,977 transcripts with N50 2,275 bp and identified 11,490 locust protein-coding genes. Comparative genomics analysis with eight other sequenced insects was carried out to identify the genomic divergence between hemimetabolous and holometabolous insects for the first time, and 18 genes relevant to development were found. We further utilized the quantitative feature of RNA-seq to measure and compare gene expression among libraries. We first discovered how divergence in gene expression between two phases progresses as locusts develop and identified 242 transcripts as candidates for phase marker genes. Together with the detailed analysis of deep sequencing data of the 4th instar, we discovered a phase-dependent divergence of biological investment at the molecular level. Solitary locusts have higher activity in biosynthetic pathways while gregarious locusts show higher activity in environmental interaction, in which genes and pathways associated with regulation of neurotransmitter activities, such as neurotransmitter receptors, synthetase, transporters, and GPCR signaling pathways, are strongly involved. Our study, as the largest de novo transcriptome to date, with optimization of sequencing and assembly strategy, can further facilitate the application of de novo transcriptome. The locust transcriptome enriches genetic resources for hemimetabolous insects and our understanding of the origin of insect metamorphosis. Most importantly, we identified genes and pathways that might be involved in locust development and phase change, and may thus benefit pest management.
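The N50 statistic quoted above is a common summary of assembly contiguity: it is the length L such that transcripts (or contigs) of length at least L together account for at least half of the total assembled bases. A minimal Python sketch follows, assuming only a plain list of sequence lengths as input; the function name `n50` is illustrative and is not taken from the study's pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Total is 54 bases; 10 + 9 + 8 = 27 reaches half, so the N50 is 8.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because N50 depends only on the length distribution, it says nothing about whether the assembled transcripts are correct; it is usually reported alongside other measures, such as the number of transcripts and of annotated genes.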

The maternal and early embryonic transcriptome of the milkweed bug Oncopeltus fasciatus

Author: A Abzhanov
A Conesa
A Dorn
A Grimson
A Papanicolaou
A Rosenblueth
A Sarkar
AP Orth
B Ewen-Campen
B Ewing
B Ewing
Ben Ewen-Campen
C Camacho
C Nüsslein-Volhard
Cassandra G Extavour
CE Bruder
CJ Lowe
CL Hughes
D Bellin
D Erezyilmaz
D Gordon
D Lawson
DA Hahn
DF Erezyilmaz
DR Angelini
E Huebner
E Kristiansson
E Meyer
E Novaes
E Toulza
EA Bogdanova
EM Zdobnov
F Cheung
F Roeding
F Zhang
FH Butt
IAG Consortium
J Schmid
JA Bolker
JC Vera
JMW Slack
K Mita
KA Panfilio
KD Pruitt
Kristen A Panfilio
M Ashburner
M Kumé
MD Piulachs
MD Robinson
N Garcia-Reyero
Nathan Shaner
P Beldade
P Liu
P Liu
P Liu
PA Lawrence
PA Lawrence
PD Danley
R Nunes da Fonseca
RA Jenner
RE Timme
RJ Sommer
S Kumar
S Tweedie
SA Shabalina
SB Hedges
Siegfried Roth
ST O'Neil
TL Parchman
UniProt_Consortium
W Brockman
WN Beklemishev
WN Beklemishev
X Huang
Y Pauchet
Y Surget-Groba
Yuichiro Suzuki
YY Zhu
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Most evolutionary developmental biology ("evo-devo") studies of emerging model organisms focus on small numbers of candidate genes cloned individually using degenerate PCR. However, newly available sequencing technologies such as 454 pyrosequencing have recently begun to allow for massive gene discovery in animals without sequenced genomes. Within insects, although large volumes of sequence data are available for holometabolous insects, developmental studies of basally branching hemimetabolous insects typically suffer from low rates of gene discovery. Results: We used 454 pyrosequencing to sequence over 500 million bases of cDNA from the ovaries and embryos of the milkweed bug Oncopeltus fasciatus, which lacks a sequenced genome. This indirectly developing insect occupies an important phylogenetic position, branching basal to Diptera (including fruit flies) and Hymenoptera (including honeybees), and is an experimentally tractable model for short-germ development. 2,087,410 reads from both normalized and non-normalized cDNA assembled into 21,097 sequences (isotigs) and 112,531 singletons. The assembled sequences fell into 16,617 unique gene models, and included predictions of splicing isoforms, which we examined experimentally. Discovery of new genes plateaued after assembly of ~1.5 million reads, suggesting that we have sequenced nearly all transcripts present in the cDNA sampled. Many transcripts have been assembled at close to full length, and there is a net gain of sequence data for over half of the pre-existing O. fasciatus accessions for developmental genes in GenBank. We identified 10,775 unique genes, including members of all major conserved metazoan signaling pathways and genes involved in several major categories of early developmental processes. We also specifically address the effects of cDNA normalization on gene discovery in de novo transcriptome analyses. Conclusions: Our sequencing, assembly and annotation framework provides a simple and effective way to achieve high-throughput gene discovery for organisms lacking a sequenced genome. These data will have applications to the study of the evolution of arthropod genes and genetic pathways, and to the wider evolution, development and genomics communities working with emerging model organisms. [The sequence data from this study have been submitted to GenBank under study accession number SRP002610 (http://www.ncbi.nlm.nih.gov/sra?term=SRP002610). Custom scripts generated are available at http://www.extavourlab.com/protocols/index.html. Seven Additional files are available.]

Kölner UniversitätsPublikationsServer

Harvard University - DASH

Springer - Publisher Connector